Registered S3 method overwritten by 'tsibble':
method from
as_tibble.grouped_df dplyr
Attaching package: 'tsibble'
The following object is masked from 'package:lubridate':
interval
The following objects are masked from 'package:base':
intersect, setdiff, union
library(feasts)
Loading required package: fabletools
Attaching package: 'fabletools'
The following object is masked from 'package:yardstick':
accuracy
The following object is masked from 'package:parsnip':
null_model
The following objects are masked from 'package:infer':
generate, hypothesize
library(dataRetrieval)library(plotly)
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
Data Import
# Example: Cache la Poudre River at Mouth (USGS site 06752260)poudre_flow <-readNWISdv(siteNumber ="06752260", # Download data from USGS for site 06752260parameterCd ="00060", # Parameter code 00060 = discharge in cfs)startDate ="2013-01-01", # Set the start dateendDate ="2023-12-31") |># Set the end daterenameNWISColumns() |># Rename columns to standard names (e.g., "Flow", "Date")mutate(Date =yearmonth(Date)) |># Convert daily Date values into a year-month format (e.g., "2023 Jan")group_by(Date) |># Group the data by the new monthly Datesummarise(Flow =mean(Flow)) # Calculate the average daily flow for each month
# Plotting Data Series poudre_plot <- poudre_components |>autoplot() +labs(title ="STL Decomposition of Poudre Flow",y ="Flow (cfs)", x ="Date") +theme_minimal()poudre_plot
# Animating the Plotpoudre_plotly <-ggplotly(poudre_plot)poudre_plotly
Visualizing Seasonal Patterns
poudre_subseries <- tibble_poudre |>gg_subseries(Flow) +labs(title ="Seasonal Subseries Plot of Poudre River Flow",y ="Average Flow (cfs)", x ="Month") +theme_minimal()poudre_subseries
Data Analysis
With each of the time series plots, we can clearly see that there is a spike in the flow of the gauge around May and June. This makes sense as this is when the heaviest amount of rain is typically found in Northern Colorado. There is also the fact that over the years, there is an overall decrease in the amount of flow, meaning that there is less precipitation to help initiate flow. The sub-series allows for the data to be sorted by the months rather than a linear, and shows if there is any outliers that may screw the analysis of the seasons. We can see this specifically in September and slightly in April, but with all of the data sorted by month, we can see the average flow for each month and easily find the season with the highest amount of flow.
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
Warning: ✖ Expecting the following names to be in the data frame: .conf_hi, .conf_lo.
ℹ Proceeding with '.conf_interval_show = FALSE' to visualize the forecast without confidence intervals.
Alternatively, try using `modeltime_calibrate()` before forecasting to add confidence intervals.
Importing the Actual Streamflows
poudre_flow_2024 <-readNWISdv(siteNumber ="06752260", # Download data from USGS for site 06752260parameterCd ="00060", # Parameter code 00060 = discharge in cfs)startDate ="2024-01-01", # Set the start dateendDate ="2024-12-31") |># Set the end daterenameNWISColumns() |># Rename columns to standard names (e.g., "Flow", "Date")mutate(Date =yearmonth(Date)) |># Convert daily Date values into a year-month format (e.g., "2023 Jan")group_by(Date) |># Group the data by the new monthly Datesummarise(Flow =mean(Flow)) # Calculate the average daily flow for each month
Warning: There was 1 warning in `filter()`.
ℹ In argument: `.index >= yearmonth("2024 Jan")`.
Caused by warning:
! Incompatible methods (">=.Date", ">=.vctrs_vctr") for ">="
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
Fitting the Linear Model
# For one model at a time (example: Prophet model)prophet_comparison <- comparison_tbl %>%filter(.model_desc =="PROPHET")# Fit linear model: actual vs predictedlm_prophet <-lm(actual_flow ~ .value, data = prophet_comparison)summary(lm_prophet)$r.squared
# Compute R² for both modelslm_prophet <-lm(actual_flow ~ .value, data = comparison_tbl %>%filter(.model_desc =="PROPHET"))lm_arima <-lm(actual_flow ~ .value, data = comparison_tbl %>%filter(.model_desc =="UPDATE: ARIMA(0,0,2)(0,1,1)[12]"))r_squared_prophet <-summary(lm_prophet)$r.squaredr_squared_arima <-summary(lm_arima)$r.squared# Plot for both modelscomparison_tbl %>%ggplot(aes(x = .value, y = actual_flow, color = .model_desc)) +geom_point(size =3) +geom_abline(intercept =0, slope =1, color ="black", linetype ="dashed", size =1) +geom_smooth(method ="lm", se =FALSE) +annotate("text", x =max(comparison_tbl$.value) *0.7, y =max(comparison_tbl$actual_flow) *0.9, label =paste("R² (Prophet) = ", round(r_squared_prophet, 3)), color ="blue", size =5) +annotate("text", x =max(comparison_tbl$.value) *0.7, y =max(comparison_tbl$actual_flow) *0.85, label =paste("R² (ARIMA) = ", round(r_squared_arima, 3)), color ="red", size =5) +labs(title ="Predicted vs Observed Streamflow (All Models)",x ="Predicted Flow (cfs)",y ="Observed Flow (cfs)",color ="Model" ) +theme_minimal()
`geom_smooth()` using formula = 'y ~ x'
Model Analysis
While both models have high R2s, when compared in a linear model, they look to be far from the observed flows. Arima has a higher R2 which details a more accurate model. The peak of the observations were far off from the observed flow, which matches with the comparison of the models to the observed flow from 2024. However, it was able to predict the periods where the flow would spike. Overall, Arima is the best model for this prediction and created a fairly accurate read.